Abstract: We address the problem of minimizing the long-run expected average cost of a complex system consisting of interactive subsystems. We formulate a multiobjective optimization problem of the one-stage expected costs of the subsystems and provide a duality framework to prove that the control policy yielding the Pareto optimal solution minimizes the average cost criterion of the system. We provide the conditions of existence and a geometric interpretation of the solution. For practical situations with constraints consistent with those studied here, our results imply that the Pareto control policy may be of value when we seek to derive online the optimal control policy in complex systems.

Index Terms: Stochastic optimal control, multiobjective optimization, complex systems, Pareto control policy.

I. Introduction 


A. Motivation 

Complex systems consist of diverse entities that interact both in space and time. Referring to something as complex implies that it consists of interdependent entities that are connected with each other and can adapt, i.e., they can respond to their local and global environment [?]. Complex systems are encountered in many applications, including sustainable transportation, fusion and other alternative energy strategies, and biological systems. For example, the US electricity grid is one of the world's largest complex systems [?], consisting of a dynamic collection of diverse, interacting components that can adapt. These components are also interdependent and operate under an enormous range of physical, reliability, economic, social, and political constraints that need to be satisfied over time scales ranging from seconds, for closed-loop control, to decades, for transmission siting and construction. Hybrid electric vehicles (HEVs) and plug-in HEVs are other complex systems [?], consisting of various interdependent subsystems, e.g., the internal combustion engine, the electric machines (motor and generator), and the energy storage system (battery), that are connected and adapt appropriately to provide the power demanded by the driver. Another example of a complex system is the hybrid distributed power generation system [?], consisting of wind turbines, photovoltaic generation, energy storage, and the relevant energy conversion control.

Stochastic optimal control of complex systems is a ubiquitous task in engineering. The problem is formulated as sequential decision-making under uncertainty, where a controller is faced with the task of selecting control actions over several time steps to achieve long-term goals efficiently. While the nature of these problems may vary widely, their underlying structure is similar and has two principal features: a discrete-time dynamic system whose state evolves according to given transition probabilities that depend on a decision at each time, and a cost function that is additive over time. The objective is to derive an optimal policy that minimizes the long-run expected average cost criterion.

Mathematically, the average cost criterion is notably more complex to analyze than other criteria; while other classical criteria lead to rational, complete solutions, the long-run average cost may not [?]. The average cost criterion in Markov chains with finite state and action spaces is well understood [?]–[12]. Dynamic programming (DP) [13] has been widely employed as the principal method for the analysis of these problems [14]–[?]. A significant amount of work has focused on inventory problems using linear programming [?], which has been widely used as an alternative to the DP method [?]–[?]. Policy iteration [?] has been another method to address problems under the average cost criterion [?], [32], by adjusting the policy of the system directly rather than deriving it through value iteration. Various other methods proposed in the literature have used matrix decomposition [?], quadratic programming for multiple costs [?], learning algorithms [?]–[36], decentralized methods [37], and the risk-sensitive criterion [?].

Despite the significant progress in optimization and control methods within the last decades, current techniques may, in some instances, be computationally impractical for online optimal control of large-scale complex systems [39]. One possible approach to ameliorating this difficulty is to develop a framework that exploits the structure of the system interconnections and narrows the range of acceptable solutions.

In this paper, we seek to establish a rigorous framework for the analysis and stochastic optimization of complex systems that will permit online implementation of the optimal control policy with respect to the long-run expected average cost criterion. The contributions of this paper are (1) the development of a duality framework for the analysis and stochastic optimization of complex systems that can be used to derive the optimal control policy; (2) the formulation and solution of a multiobjective optimization problem of the one-stage expected costs of all interactive subsystems, yielding an equilibrium operating point among the subsystems that minimizes the long-run expected average cost of the system; and (3) the geometric interpretation of the solution and the formulation of the conditions under which the optimal control policy exists.

The remainder of the paper proceeds as follows. In Section 
II, we introduce our notation and formulate the problem. 
In Section III, we develop a multiobjective optimization
framework to address the problem and introduce the Pareto 
control policy. In Section IV, we show that the Pareto control 
policy minimizes the long-run expected average cost criterion. 
Finally, we present illustrative examples in Section V and 
concluding remarks in Section VI. 

II. System Model and Problem Formulation 

A. Notation 

We denote random variables with upper case letters and their realizations with lower case letters; e.g., for a random variable $X$, $x$ denotes its realization. Subscripts denote time, and subscripts in parentheses denote subsystems; for example, $X_{t(i)}$ denotes the random variable of subsystem $i$ at time $t$, and $x_{t(i)}$ its realization. The shorthand notation $X_{t(1:N)}$ denotes the vector of random variables $(X_{t(1)}, X_{t(2)}, \dots, X_{t(N)})$, and $x_{(1:N)}$ denotes the vector of their realizations $(x_{(1)}, x_{(2)}, \dots, x_{(N)})$. $P(\cdot)$ is the transition probability matrix, and $E[\cdot]$ is the corresponding expectation of a random variable. For a control policy $\pi$, we use $P^{\pi}(\cdot)$, $E^{\pi}[\cdot]$, and $\beta^{\pi}$ to denote that the transition probability matrix, expectation, and stationary distribution depend on the choice of the control policy $\pi$.

B. The System Model 

We consider a system consisting of N subsystems. The subsystems interact with each other and with their environment. At time $t$, $t = 1, 2, \dots, T$, the state of each subsystem, $X_{t(i)}$, takes values in a finite state space $S_{(i)}$, which is a metric space. For each subsystem $i$, we also consider a finite control space $U_{(i)}$, which is also a metric space, from which control actions $U_{t(i)}$ are chosen.

The initial state of the system, $X_{0(1:N)}$, is a random variable taking values in the system's state space, $S = \prod_{i=1}^{N} S_{(i)}$. The evolution of the state is governed by the discrete-time equation

$$X_{t+1(1:N)} = f(X_{t(1:N)}, U_{t(1:N)}, W_{t(1:N)}), \quad (1)$$

where $W_{t(1:N)}$ is the input from the environment. The system state is completely observed.

In our formulation, a state-dependent constraint is incorporated; that is, for each realization of the state of subsystem $i$, $X_{t(i)} = x_{(i)}$, there is a nonempty and closed set $C(x_{(i)}) := \{u_{(i)} \mid X_{t(i)} = x_{(i)}\} \subseteq U_{(i)}$ of feasible control actions when the subsystem is in state $x_{(i)}$. For each subsystem $i$, we denote the set of admissible state/action pairs by

$$\Gamma_{(i)} := \{(x_{(i)}, u_{(i)}) \mid x_{(i)} \in S_{(i)} \text{ and } u_{(i)} \in C(x_{(i)})\}. \quad (2)$$


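The set of admissible state/action pairs for the system is

$$\Gamma := \prod_{i=1}^{N} \Gamma_{(i)} = \{(x_{(1:N)}, u_{(1:N)}) \mid x_{(1:N)} \in S \text{ and } u_{(1:N)} \in C(x_{(1:N)})\}, \quad (3)$$

where $C(x_{(1:N)}) = \prod_{i=1}^{N} C(x_{(i)})$.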

For each state of the system, $X_{t(1:N)} = x_{(1:N)}$, we define the functions $\mu: S \to U$, where $U = \prod_{i=1}^{N} U_{(i)}$, mapping the state space to the control action space; we refer to $\mu$ as the control law. When the system is at state $X_{t(1:N)} = x_{(1:N)}$, the controller chooses the action according to the control law $\mu(x_{(1:N)})$.

Definition 1: Each sequence of the functions $\mu$ is defined as a stationary control policy of the system,

$$\pi := (\mu(1), \mu(2), \dots, \mu(|S|)), \quad (4)$$

where $|S|$ is the cardinality of the system's state space $S$.

Let $\Pi$ denote the set of all stationary control policies,

$$\Pi := \{\pi \mid \pi = (\mu(1), \mu(2), \dots, \mu(|S|))\}. \quad (5)$$

The stationary control policy $\pi$ operates as follows. Associated with each state $X_{t(1:N)} = x_{(1:N)}$ is the function $\mu(x_{(1:N)}) \in C(x_{(1:N)})$. If at any time the controller finds the system in state $x_{(1:N)}$, then it always chooses the action given by $\mu(x_{(1:N)})$. A stationary policy depends on the history of the process only through the current state; thus, to implement it, the controller only needs to know the current state of the system. This makes stationary policies easier to implement, since they require storing less information than a general policy.
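A stationary policy is therefore a lookup table indexed by the current state. The Python sketch below (the function names are ours, not the paper's) enumerates $\Pi$ as the Cartesian product of the feasible action sets $C(x)$. For the usage example we assume, as one reading consistent with the sixteen control policies of the two-subsystem example in Section V, that each subsystem's action depends only on its own state.

```python
from itertools import product


def enumerate_stationary_policies(states, feasible):
    """Return every stationary policy pi = (mu(1), ..., mu(|S|)).

    states   -- ordered list of states; the k-th tuple entry is the action in states[k]
    feasible -- dict mapping each state x to its feasible action set C(x)
    """
    return list(product(*(feasible[x] for x in states)))


def act(policy, states, x):
    """A stationary controller only needs the current state x to pick its action."""
    return policy[states.index(x)]


# Hypothetical two-state, two-action subsystem (S_(i) = {1, 2}, U_(i) = {a, b}).
states_i = [1, 2]
feasible_i = {1: ["a", "b"], 2: ["a", "b"]}
policies_i = enumerate_stationary_policies(states_i, feasible_i)  # 2 * 2 = 4 policies

# Assumption: each subsystem acts on its own state only, so a system policy
# is a pair of subsystem policies.
system_policies = list(product(policies_i, policies_i))

print(len(policies_i), len(system_policies))  # 4 16
print(act(("a", "b"), states_i, 2))           # b
```

Under the fully general definition, where $\mu$ maps each of the four system states to any joint action, the same enumeration would instead yield $4^4 = 256$ policies.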

At each stage $t$, the controller observes the state of the system, $X_{t(1:N)} = x_{(1:N)} \in S$, and an action, $U_{t(1:N)} = \mu(X_{t(1:N)})$, is realized from the feasible set of actions at that state. At the same stage $t$, an uncertainty, $W_{t(1:N)}$, enters the system. At the next stage, the system transits to the state $X_{t+1(1:N)} = x'_{(1:N)} \in S$, and a transition cost is incurred for each subsystem $i$, $c_{t(i)}(x'_{(i)} \mid x_{(i)}, u_{(i)})$, where $c_{t(i)}: S_{(i)} \times C(x_{(i)}) \times S_{(i)} \to \mathbb{R}$, as well as for the system, $c_t(x'_{(1:N)} \mid x_{(1:N)}, u_{(1:N)})$, where $c_t: S \times C(x_{(1:N)}) \times S \to \mathbb{R}$.
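The observe, act, transit, and pay loop above can be simulated directly. The sketch below is a minimal illustration under stated assumptions: a single two-state system, hypothetical action-dependent transition probabilities, and a hypothetical transition cost $c(x' \mid x, u)$ equal to 1 when the state stays put and 3 when it changes. The running mean of the incurred costs is the finite-horizon quantity whose limit the paper takes as the average cost criterion.

```python
import random


def simulate_average_cost(states, mu, transition, cost, x0, T, seed=0):
    """Run the observe-act-transit loop for T stages and return the mean cost.

    mu         -- stationary control law: dict state -> action
    transition -- dict (x, u) -> list of probabilities over `states`
    cost       -- function cost(x_next, x, u) giving the transition cost
    """
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(T):
        u = mu[x]  # the action depends only on the current state
        x_next = rng.choices(states, weights=transition[(x, u)])[0]
        total += cost(x_next, x, u)  # cost incurred on this transition
        x = x_next
    return total / T


# Hypothetical data (not taken from the paper).
states = [1, 2]
transition = {
    (1, "a"): [0.7, 0.3], (1, "b"): [0.9, 0.1],
    (2, "a"): [0.4, 0.6], (2, "b"): [0.2, 0.8],
}
mu = {1: "a", 2: "b"}


def cost(x_next, x, u):
    return 1.0 if x_next == x else 3.0


estimate = simulate_average_cost(states, mu, transition, cost, x0=1, T=200_000)
print(estimate)
```

For these data the chain under $\mu$ has $P = [[0.7, 0.3], [0.2, 0.8]]$, stationary distribution $(0.4, 0.6)$ and one-stage expected costs $(1.6, 1.4)$, so the estimate should approach $0.4 \cdot 1.6 + 0.6 \cdot 1.4 = 1.48$.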

C. Assumptions 

In the model described above, we make the following assumptions:

(A1) There exists $\mu$ such that the graph of $\mu$ is included in $\Gamma$.

(A2) The input from the uncertainty, $W_{t(1:N)}$, is a sequence of independent random variables, independent of the initial state $X_{0(1:N)}$, and takes values in the finite set $\mathcal{W}$.

(A3) For each stationary control policy $\pi$, the Markov chain $\{X_{t(1:N)} \mid t = 0, 1, 2, \dots\}$ has a unique stationary probability distribution (row vector).

(A4) The one-stage expected cost of the system, $k: \Gamma \to \mathbb{R}$,

$$k(x_{(1:N)}, u_{(1:N)}) = E\big[c_t(X_{t+1(1:N)} \mid X_{t(1:N)} = x_{(1:N)}, U_{t(1:N)} = u_{(1:N)})\big],$$

is a continuous function of the one-stage costs of the subsystems, and it is uniformly bounded.

(A5) The control action realized at each subsystem does not affect the transition probability matrix of the other subsystems.

We briefly comment on the above assumptions. A1 ensures that the set of stationary control policies, $\Pi$, is nonempty. A2 implies that the state $X_{t+1(1:N)}$ depends only on $X_{t(1:N)}$ and $U_{t(1:N)}$; namely, the evolution of the state is a Markov chain [?]. A3 implies that for each stationary policy $\pi \in \Pi$ there is a unique probability distribution (row vector) $\beta^{\pi} = (\beta^{\pi}(1), \beta^{\pi}(2), \dots, \beta^{\pi}(|S|))$, with $\sum_{k=1}^{|S|} \beta^{\pi}(k) = 1$ [?, p. 227], such that $\beta^{\pi} P^{\pi} = \beta^{\pi}$. Under this assumption, it is known [?, p. 175] that

$$\lim_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} (P^{\pi})^{t} = \mathbf{1} \cdot \beta^{\pi}, \quad (6)$$

where $P^{\pi}$ is the transition probability matrix and $\mathbf{1} = [1, 1, \dots, 1]^{T}$. A4 states that the interaction of the subsystems affects the one-stage expected cost of the system. Finally, A5 implies that the subsystems evolve independently.
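Equation (6) can be checked numerically. The pure-Python sketch below (helper names are ours) solves $\beta(P - I) = 0$ with $\sum \beta = 1$ by Gaussian elimination and compares $\beta$ with the Cesàro average on the left-hand side of (6) at a finite $T$. Both matrices are hypothetical; the second is periodic, so its powers never converge, yet the average in (6) still does, which is why the limit is taken over averages rather than over powers of $P^{\pi}$.

```python
def stationary_distribution(P):
    """Solve beta (P - I) = 0 with sum(beta) = 1; unique under (A3)."""
    n = len(P)
    # Column form (P^T - I) beta^T = 0; one equation is redundant, so the last
    # one is replaced by the normalization sum(beta) = 1.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[-1], b[-1] = [1.0] * n, 1.0
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, n))) / A[r][r]
    return beta


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def cesaro_average(P, T):
    """(1 / (T + 1)) * sum_{t=0}^{T} P^t, the left-hand side of (6) at finite T."""
    n = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    total = [row[:] for row in power]
    for _ in range(T):
        power = matmul(power, P)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [[x / (T + 1) for x in row] for row in total]


P = [[0.7, 0.3], [0.2, 0.8]]           # hypothetical aperiodic chain
P_periodic = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical periodic chain

print(stationary_distribution(P))           # approximately [0.4, 0.6]
print(stationary_distribution(P_periodic))  # [0.5, 0.5]
print(cesaro_average(P_periodic, 1000))     # both rows close to [0.5, 0.5]
```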


D. Problem Formulation 


We are concerned with deriving a stationary optimal control policy $\pi$ that minimizes the long-run expected average cost of the system

$$J(\pi) = \lim_{T \to \infty} \frac{1}{T+1} E^{\pi}\Big[\sum_{t=0}^{T} k(X_{t(1:N)}, U_{t(1:N)})\Big]. \quad (7)$$

Since for each control policy the Markov chain has a unique stationary probability distribution (A3), the limit in (7) exists. Substituting (6) into (7) shows that the long-run average cost, $J(\pi)$, does not depend on the initial state $X_{0(1:N)}$ and is given simply as


$$J(\pi) = \beta^{\pi} \cdot k^{\pi}, \quad (8)$$

where $\beta^{\pi}$ is the stationary probability distribution of the entire system and

$$k^{\pi} = \big(k^{\pi}(1, U_{t(1:N)}), k^{\pi}(2, U_{t(1:N)}), \dots, k^{\pi}(|S|, U_{t(1:N)})\big)^{T} \quad (9)$$

is the column vector of the system's one-stage expected costs.

Various methods discussed in the Introduction can be used to solve (7) or (8) offline and derive the optimal control policy that minimizes the long-run expected average cost $J$ of the system. In this paper, we seek a theoretical framework that will implement the optimal control policy online while the subsystems interact with each other. The intention here is to identify an equilibrium operating point among the subsystems; if the subsystems operate at this equilibrium, then the average cost of the system will be minimized.


III. Multiobjective Optimization Analysis
A. Pareto Control Policy

To identify an equilibrium operating point among the subsystems, we formulate a multiobjective optimization problem for the one-stage costs of the subsystems. Let us consider the function $f: \Gamma \to \mathbb{R}^{N}$,

$$f(X_{t(1:N)}, U_{t(1:N)}) = \big(k_{(1)}(X_{t(1:N)}, U_{t(1:N)}), \dots, k_{(N)}(X_{t(1:N)}, U_{t(1:N)})\big)^{T}, \quad (10)$$

where $k_{(i)}(X_{t(1:N)}, U_{t(1:N)})$ is the one-stage expected cost of subsystem $i$, and the following multiobjective optimization problem:

$$\min_{U_{t(1:N)} \in C(x_{(1:N)})} \big(k_{(1)}(X_{t(1:N)}, U_{t(1:N)}), k_{(2)}(X_{t(1:N)}, U_{t(1:N)}), \dots, k_{(N)}(X_{t(1:N)}, U_{t(1:N)})\big). \quad (11)$$

The solution of problem (11) is called Pareto efficiency. In a Pareto-efficient allocation among agents, no agent can be made better off without making at least one other agent worse off. The following result provides the conditions under which the Pareto efficiency exists.

Proposition 1 [42]: Let $\Gamma$ be a nonempty and compact set, and let the one-stage expected cost of each subsystem $i$, $k_{(i)}(X_{t(1:N)}, U_{t(1:N)}): \Gamma \to \mathbb{R}$, be lower semicontinuous for all $i = 1, \dots, N$. Then the Pareto efficiency is not empty.

In our problem, the set of admissible state/action pairs, $\Gamma$, is nonempty (A1) and compact, since the state and control spaces are finite. Furthermore, the one-stage expected cost of each subsystem $i$, $k_{(i)}(X_{t(1:N)}, U_{t(1:N)})$, is a continuous function (A4). Consequently, the Pareto efficiency exists.

Definition 2: The Pareto control policy $\pi^{\circ}$ is defined as the policy that yields the minimum one-stage expected cost of the system, $k(X_{t(1:N)}, U_{t(1:N)})$, at each realization of the system state $X_{t(1:N)} = x_{(1:N)}$; that is, $k^{\pi^{\circ}} \le k^{\pi}$ componentwise for all $\pi \in \Pi$.
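For a finite action set, the Pareto efficiency of (11) at a fixed state can be computed by discarding every action whose cost vector is dominated. The sketch below uses hypothetical one-stage expected costs for two subsystems and four joint actions; the helper names are ours. Because the candidate set is finite and nonempty, at least one action always survives, in line with Proposition 1.

```python
def dominates(f, g):
    """True if cost vector f Pareto-dominates g (no worse everywhere, better somewhere)."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))


def pareto_efficient(candidates):
    """Actions whose cost vectors (k_(1), ..., k_(N)) no other action dominates.

    candidates -- dict: action -> tuple of one-stage expected subsystem costs
                  at one fixed state x_(1:N)
    """
    return {
        u: f
        for u, f in candidates.items()
        if not any(dominates(g, f) for v, g in candidates.items() if v != u)
    }


# Hypothetical costs (k_(1), k_(2)) for four joint actions at one state.
costs_at_x = {
    ("a", "a"): (2.0, 3.0),
    ("a", "b"): (1.5, 3.5),
    ("b", "a"): (2.5, 2.5),
    ("b", "b"): (2.6, 3.1),  # dominated by ("b", "a") and by ("a", "a")
}
print(sorted(pareto_efficient(costs_at_x)))  # [('a', 'a'), ('a', 'b'), ('b', 'a')]
```

Note that the filter returns a set of trade-offs, not a single action; Definition 2 then singles out the action with the smallest one-stage expected cost of the system.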


B. Impact of the Pareto Control Policy on the System's Expected Cost

To simplify notation, in the rest of the paper the one-stage expected cost of each subsystem $i$, $k_{(i)}(X_{t(1:N)}, U_{t(1:N)})$, and the one-stage expected cost of the system, $k(X_{t(1:N)}, U_{t(1:N)})$, incurred when the system operates under the control policy $\pi$, will be denoted by $k^{\pi}_{(i)}$ and $k^{\pi}$, respectively.

Definition 3: In a system consisting of N interactive subsystems, the group of subsystems with respect to whose expected costs the cost of the system is a decreasing function is defined as the minor group.

Definition 4: In a system consisting of N interactive subsystems, the group of subsystems with respect to whose expected costs the cost of the system is an increasing function is defined as the principal group.

Without loss of generality, we assume that the minor group consists of subsystems $1, 2, \dots, m$, $m \in \mathbb{N}$, and the principal group consists of subsystems $m+1, \dots, N$. Since the one-stage expected cost of the system is a function $\delta$ of the one-stage costs of the subsystems (A4),

$$k^{\pi} = \delta\big(k^{\pi}_{(1)}, k^{\pi}_{(2)}, \dots, k^{\pi}_{(N)}\big), \quad (12)$$

it follows from Definition 3 that, for each subsystem $i$ in the minor group and for any two control policies $\pi, \pi' \in \Pi$ such that $k^{\pi}_{(i)} < k^{\pi'}_{(i)}$, if we fix the one-stage costs of the other subsystems in both the minor and principal groups, we have

$$k^{\pi} = \delta(\dots, k^{\pi}_{(i)}, \dots) > \delta(\dots, k^{\pi'}_{(i)}, \dots) = k^{\pi'}. \quad (13)$$

Similarly, from Definition 4, for each subsystem $j$ in the principal group and for any two control policies $\pi, \pi' \in \Pi$ such that $k^{\pi}_{(j)} < k^{\pi'}_{(j)}$, if we fix the one-stage costs of the other subsystems in both the minor and principal groups, we have

$$k^{\pi} = \delta(\dots, k^{\pi}_{(j)}, \dots) < \delta(\dots, k^{\pi'}_{(j)}, \dots) = k^{\pi'}. \quad (14)$$


1) Problem 1: We consider the special case where the system consists of N subsystems of a minor group only.

Proposition 2: The solution of the following multiobjective optimization problem at each realization of the state $X_{t(1:N)} = x_{(1:N)}$ yields the Pareto control policy of the system:

$$\max_{U_{t(1:N)} \in C(x_{(1:N)})} \big(k_{(1)}(X_{t(1:N)}, U_{t(1:N)}), \dots, k_{(N)}(X_{t(1:N)}, U_{t(1:N)})\big) \quad \text{subject to } X_{t(1:N)} \in S. \quad (15)$$

Proof: Let $u^{*}_{t(1:N)} \in C(x_{(1:N)})$ be the solution of (15) at each realization of the state $X_{t(1:N)} = x_{(1:N)}$ under the control policy $\pi$. Thus, if we operate the system under $\pi$, then $k^{\pi}_{(i)} \ge k^{\pi'}_{(i)}$, $i = 1, \dots, N$, for all $\pi' \in \Pi$. Therefore, from Definition 3, we have $k^{\pi} \le k^{\pi'}$ for all $\pi' \in \Pi$, and hence, from Definition 2, $\pi$ is the Pareto control policy. ■

2) Problem 2: We consider the special case where the system consists of N subsystems of a principal group only.

Proposition 3: The solution of the following multiobjective optimization problem at each realization of the state $X_{t(1:N)} = x_{(1:N)}$ yields the Pareto control policy of the system:

$$\min_{U_{t(1:N)} \in C(x_{(1:N)})} \big(k_{(1)}(X_{t(1:N)}, U_{t(1:N)}), \dots, k_{(N)}(X_{t(1:N)}, U_{t(1:N)})\big) \quad \text{subject to } X_{t(1:N)} \in S. \quad (16)$$

Proof: Let $u^{*}_{t(1:N)} \in C(x_{(1:N)})$ be the solution of (16) at each realization of the state $X_{t(1:N)} = x_{(1:N)}$ under the control policy $\pi$. Thus, if we operate the system under $\pi$, then $k^{\pi}_{(i)} \le k^{\pi'}_{(i)}$, $i = 1, \dots, N$, for all $\pi' \in \Pi$. Therefore, from Definition 4, we have $k^{\pi} \le k^{\pi'}$ for all $\pi' \in \Pi$, and hence, from Definition 2, $\pi$ is the Pareto control policy. ■


3) Problem 3: We consider the general case where the system consists of N subsystems belonging to both a minor and a principal group.

In this case, to derive the Pareto control policy, we formulate the following optimization problem for the one-stage cost of the system:

$$\min_{U_{t(1:N)} \in C(x_{(1:N)})} \delta\big(k_{(1)}, k_{(2)}, \dots, k_{(N)}\big) \quad \text{subject to } X_{t(1:N)} \in S. \quad (17)$$

The Pareto control policy is derived by computing, at each realization of the system state $X_{t(1:N)} = x_{(1:N)} \in S$, the control action that yields the minimum one-stage expected cost of the system in (17).
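Problem (17) is solved state by state, so for finite sets the Pareto control policy is a table of per-state minimizers. In the sketch below, the system-cost function $\delta$ of (12) is a hypothetical choice, $\delta(k_{(1)}, k_{(2)}) = k_{(2)} - 0.5\,k_{(1)}$, which is decreasing in the cost of subsystem 1 (minor group) and increasing in the cost of subsystem 2 (principal group); the cost table and the function name are likewise made up for illustration.

```python
def pareto_control_policy(states, feasible, subsystem_costs, delta):
    """Solve (17) at every state: pick u in C(x) minimizing delta(k_(1), ..., k_(N)).

    subsystem_costs(x, u) -- tuple of one-stage expected subsystem costs
    delta(costs)          -- one-stage expected cost of the system, as in (12)
    Ties are broken by the order of feasible[x].
    """
    return {x: min(feasible[x], key=lambda u: delta(subsystem_costs(x, u)))
            for x in states}


# Hypothetical data for illustration only.
table = {
    (1, "a"): (1.0, 2.0), (1, "b"): (2.0, 2.2),
    (2, "a"): (0.5, 1.0), (2, "b"): (0.4, 1.5),
}
states = [1, 2]
feasible = {1: ["a", "b"], 2: ["a", "b"]}


def delta(k):
    return k[1] - 0.5 * k[0]  # decreasing in k_(1), increasing in k_(2)


policy = pareto_control_policy(states, feasible, lambda x, u: table[(x, u)], delta)
print(policy)  # {1: 'b', 2: 'a'}
```

Each state only needs the current cost table, which is what makes this per-state computation a candidate for online use.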

IV. Duality Framework 
A. Geometric Framework for Duality Analysis 

We use a geometric framework from duality analysis, referred to as the min common/max crossing point problems (see [43], p. 120), to show that the Pareto control policy is an optimal control policy that minimizes the long-run expected average cost of the system, and to provide a geometric interpretation of the solution.

The min common/max crossing point framework captures the most essential elements of duality by considering two geometric problems. Let us consider a nonempty subset $A$ of $\mathbb{R}^{n+1}$, shown in Fig. 1. The axis $\theta$ corresponds to $\mathbb{R}^{n}$ and the axis $\varphi$ corresponds to $\mathbb{R}$.

The first geometric problem, the min common point problem, seeks the minimum value $\varphi^{*}$ among the points of $A$ that lie on the $\varphi$ axis. The second geometric problem, the max crossing point problem, seeks a nonvertical hyperplane that contains $A$ in its upper closed half space and crosses the $\varphi$ axis at a maximum point $b^{*}$.

Mathematically, the min common point problem can be written as

$$\min \varphi \quad \text{subject to } (0, \varphi) \in A. \quad (18)$$

A nonvertical hyperplane in $\mathbb{R}^{n+1}$ is specified by its normal $(\nu, 1) \in \mathbb{R}^{n+1}$, where $\nu \in \mathbb{R}^{n}$, and a scalar $\lambda \in \mathbb{R}$ as

$$\varphi + \nu'\theta = \lambda. \quad (19)$$

Such a hyperplane crosses the $(n+1)$st axis, $\varphi$, at $(0, \lambda)$. The hyperplane contains $A$ in its upper closed half space if and only if, for all $(\theta, \varphi) \in A$,

$$\varphi + \nu'\theta \ge \lambda, \quad (20)$$

or, equivalently,

$$\inf_{(\theta, \varphi) \in A} \{\varphi + \nu'\theta\} \ge \lambda. \quad (21)$$

Thus the max crossing point problem can be written as

$$\max_{\nu \in \mathbb{R}^{n}} \inf_{(\theta, \varphi) \in A} \{\varphi + \nu'\theta\}. \quad (22)$$

The function $b(\nu) = \inf_{(\theta, \varphi) \in A} \{\varphi + \nu'\theta\}$ is the dual function.

Definition 5: If $(\bar{\theta}, \bar{\varphi})$ belongs to the closure of $A$ and, for all $(\theta, \varphi) \in A$, $\bar{\varphi} + \nu'\bar{\theta} \le \varphi + \nu'\theta$, we say that the hyperplane supports $A$ at $(\bar{\theta}, \bar{\varphi})$.



Fig. 1. Geometric framework for duality analysis. 

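Proposition 4 (see [43], p. 123): The max crossing point of the dual function is less than or equal to the min common point, namely, $b^{*} \le \varphi^{*}$.

Proof: For every $\nu \in \mathbb{R}^{n}$, the points $(0, \varphi) \in A$ are among those over which the infimum defining $b(\nu)$ is taken, and for them $\varphi + \nu'\theta = \varphi$; hence

$$b(\nu) = \inf_{(\theta, \varphi) \in A} \{\varphi + \nu'\theta\} \le \inf_{(0, \varphi) \in A} \{\varphi\}. \quad (23)$$

Taking the supremum over $\nu \in \mathbb{R}^{n}$, we have

$$b^{*} = \sup_{\nu \in \mathbb{R}^{n}} \inf_{(\theta, \varphi) \in A} \{\varphi + \nu'\theta\} \le \varphi^{*} = \inf_{(0, \varphi) \in A} \{\varphi\}. \quad (24)$$

■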
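Proposition 4 (weak duality) can be seen on a small example. The sketch below takes a hypothetical finite, nonconvex set $A \subset \mathbb{R}^2$ ($n = 1$), evaluates the dual function $b(\nu)$ on a grid of $\nu$, and compares its maximum with the min common point $\varphi^{*}$. Here $b^{*} = 1.75 < \varphi^{*} = 2$, a strict gap caused by the nonconvexity of $A$. When every point of $A$ lies on the $\varphi$ axis, as Lemma 2 below establishes for the set (27), $b(\nu) = \varphi^{*}$ for every $\nu$ and the gap closes. The function names are ours.

```python
def dual_function(A, nu):
    """b(nu) = inf over (theta, phi) in A of phi + nu'theta, for a finite set A."""
    return min(phi + sum(v * t for v, t in zip(nu, theta)) for theta, phi in A)


def min_common_point(A):
    """phi* = min of phi over the points of A lying on the phi axis (theta = 0)."""
    return min(phi for theta, phi in A if all(t == 0 for t in theta))


# Hypothetical finite set A of points (theta, phi) with theta in R^1.
A = [((-1.0,), 3.0), ((0.0,), 2.0), ((0.0,), 2.5), ((1.0,), 0.5), ((2.0,), 1.0)]
grid = [(i / 100,) for i in range(-300, 301)]  # nu in [-3, 3]

phi_star = min_common_point(A)
b_star = max(dual_function(A, nu) for nu in grid)
print(phi_star, b_star)  # 2.0 1.75  (b* <= phi*, with a gap)

# Points only on the phi axis: b(nu) = phi* for every nu, so there is no gap.
A_axis = [((0.0,), 2.0), ((0.0,), 2.7)]
print(max(dual_function(A_axis, nu) for nu in grid))  # 2.0
```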

B. Strong Duality of the Pareto Control Policy 

We want to investigate the impact of the Pareto control policy on the long-run expected average cost of the system. This involves characterizing the solution of the Pareto control policy within a duality framework. We recall that $k^{\pi}$ is the column vector of the system's one-stage expected costs for each state, $1, \dots, |S|$, under the control policy $\pi = (\mu(1), \dots, \mu(|S|))$, namely (9).

We formulate the following problem:

$$\min_{\pi \in \Pi} \|k^{\pi} + M^{\pi} \cdot q\| \quad \text{subject to } \beta^{\pi} \cdot M^{\pi} = 0, \quad (25)$$

where $M^{\pi} = P^{\pi} - I$, $q \in \mathbb{R}^{|S|}$ is such that $M^{\pi} \cdot q \ge 0$, and $\beta^{\pi} = (\beta^{\pi}(1), \dots, \beta^{\pi}(|S|))$ is the probability distribution corresponding to the control policy $\pi$.

We refer to this problem as the primal problem, and we denote by $\|k^{\pi} + M^{\pi} \cdot q\|^{*}$ its optimal value. The Lagrangian function of the above minimization problem is

$$L(\pi, \nu) = \|k^{\pi} + M^{\pi} \cdot q\| + (\beta^{\pi} \cdot M^{\pi}) \cdot \nu, \quad (26)$$

where $\nu \in \mathbb{R}^{|S|}$ is the vector of Lagrange multipliers.

We use the min common/max crossing point framework described above to visualize the duality in (26). We consider the following set

$$A := \{(\beta^{\pi} \cdot M^{\pi}, \|k^{\pi} + M^{\pi} \cdot q\|) \mid \pi \in \Pi\}. \quad (27)$$

Lemma 1: The hyperplane with normal $(\nu, 1)$ that passes through the vector $(\beta^{\pi} \cdot M^{\pi}, \|k^{\pi} + M^{\pi} \cdot q\|)$ intercepts the vertical axis $\varphi$ at the value of $L(\pi, \nu)$.

Proof: The hyperplane with normal $(\nu, 1)$ that passes through $(\beta^{\pi} \cdot M^{\pi}, \|k^{\pi} + M^{\pi} \cdot q\|)$ satisfies

$$\varphi + \theta' \cdot \nu = \|k^{\pi} + M^{\pi} \cdot q\| + (\beta^{\pi} \cdot M^{\pi}) \cdot \nu = L(\pi, \nu). \quad (28)$$

■

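Lemma 2: The hyperplane that passes through $\|k^{\pi} + M^{\pi} \cdot q\|^{*}$ supports $A$.

Proof: From Lemma 1 we have

$$\varphi + \theta' \cdot \nu = \|k^{\pi} + M^{\pi} \cdot q\| + (\beta^{\pi} \cdot M^{\pi}) \cdot \nu = \lambda. \quad (29)$$

Since for each stationary control policy we have a unique probability distribution (A3),

$$\beta^{\pi} \cdot M^{\pi} = \beta^{\pi} \cdot (P^{\pi} - I) = \beta^{\pi} - \beta^{\pi} = 0. \quad (30)$$

Thus, for each control policy $\pi \in \Pi$, the elements of the set $A$ are located only on the axis $\varphi$, and

$$\|k^{\pi} + M^{\pi} \cdot q\| = \lambda. \quad (31)$$

Thus

$$\|k^{\pi} + M^{\pi} \cdot q\|^{*} = \varphi^{*} \le \|k^{\pi} + M^{\pi} \cdot q\| = \lambda, \quad \forall \pi \in \Pi. \quad (32)$$

■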
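Equation (30) drives both Lemma 2 and the proof of Theorem 1 below: because $\beta^{\pi} M^{\pi} = 0$, the term $M^{\pi} q$ never changes the $\beta^{\pi}$-weighted cost, so $\beta^{\pi}(k^{\pi} + M^{\pi} q) = \beta^{\pi} k^{\pi}$ for any $q$. The sketch below checks this for a hypothetical two-state chain, using the closed form $\beta = (c, a)/(a + c)$ for $P = [[1 - a, a], [c, 1 - c]]$. The vector $q$ is arbitrary, since the sign condition $M^{\pi} q \ge 0$ plays no role in this particular identity.

```python
def beta_two_state(P):
    """Stationary distribution of a 2-state chain P = [[1-a, a], [c, 1-c]]."""
    a, c = P[0][1], P[1][0]
    return [c / (a + c), a / (a + c)]


def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


P = [[0.7, 0.3], [0.2, 0.8]]  # hypothetical P^pi
k = [2.0, 1.0]                # hypothetical one-stage expected costs k^pi
q = [5.0, -3.0]               # arbitrary vector q

beta = beta_two_state(P)
M = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]  # M = P - I
k_plus_Mq = [x + y for x, y in zip(k, matvec(M, q))]

print(vecmat(beta, M))                    # approximately [0.0, 0.0], i.e. (30)
print(dot(beta, k), dot(beta, k_plus_Mq))  # both approximately 1.4
```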

Theorem 1: The Pareto control policy $\pi^{\circ}$ is the optimal control policy that minimizes the long-run expected average cost criterion of the system under assumptions (A3) and (A4).

Proof: Let

$$\mathbf{1} \cdot \psi^{\pi} = k^{\pi} + M^{\pi} \cdot q, \quad \forall \pi \in \Pi, \quad (33)$$

where $\mathbf{1} = (1, 1, \dots, 1)^{T}$ and $\psi^{\pi} \in \mathbb{R}$. Recall that $q \in \mathbb{R}^{|S|}$ is such that $M^{\pi} \cdot q \ge 0$. Multiplying the above equation from the left by $\beta^{\pi} = (\beta^{\pi}(1), \beta^{\pi}(2), \dots, \beta^{\pi}(|S|))$, we have

$$\psi^{\pi} = \beta^{\pi} \cdot k^{\pi} + \beta^{\pi} \cdot M^{\pi} \cdot q \quad (34)$$

$$= \beta^{\pi} \cdot k^{\pi} + \beta^{\pi} \cdot (P^{\pi} - I) \cdot q \quad (35)$$

$$= \beta^{\pi} \cdot k^{\pi} + \beta^{\pi} \cdot P^{\pi} \cdot q - \beta^{\pi} \cdot q \quad (36)$$

$$= \beta^{\pi} \cdot k^{\pi} + \beta^{\pi} \cdot q - \beta^{\pi} \cdot q = \beta^{\pi} \cdot k^{\pi}, \quad (37)$$

since $\beta^{\pi} \cdot \mathbf{1} = 1$, $M^{\pi} = P^{\pi} - I$, and $\beta^{\pi} \cdot P^{\pi} = \beta^{\pi}$. So, from (8), $\psi^{\pi}$ is the long-run expected average cost corresponding to the control policy $\pi$.

From Definition 2 of the Pareto control policy,

$$k^{\pi^{\circ}} \le k^{\pi}, \quad \forall \pi \in \Pi, \quad (38)$$

and since $M^{\pi} \cdot q \ge 0$, (38) and (33) yield

$$k^{\pi^{\circ}} \le k^{\pi} + M^{\pi} \cdot q = \mathbf{1} \cdot \psi^{\pi}, \quad (39)$$

where $\psi^{\pi}$ is the long-run expected average cost corresponding to any control policy $\pi \in \Pi$. Multiplying (39) by $\beta^{\pi^{\circ}}$ from the left, we have

$$\psi^{\pi^{\circ}} = \beta^{\pi^{\circ}} \cdot k^{\pi^{\circ}} \le \psi^{\pi}, \quad \forall \pi \in \Pi. \quad (40)$$

Thus the Pareto control policy is the optimal control policy that minimizes the long-run expected average cost. ■

Theorem 2: The Pareto control policy $\pi^{\circ}$ supports $A$.

Proof: From Theorem 1 we have

$$k^{\pi^{\circ}} + M^{\pi^{\circ}} \cdot q \le k^{\pi} + M^{\pi} \cdot q, \quad \forall \pi \in \Pi. \quad (41)$$

Hence

$$\|k^{\pi^{\circ}} + M^{\pi^{\circ}} \cdot q\| \le \|k^{\pi} + M^{\pi} \cdot q\|, \quad (42)$$

and from Lemma 2,

$$\|k^{\pi} + M^{\pi} \cdot q\|^{*} = \|k^{\pi^{\circ}} + M^{\pi^{\circ}} \cdot q\|. \quad (43)$$

■

Corollary 1: There is no duality gap in (26), and thus the Pareto control policy $\pi^{\circ}$ yields the global optimal solution.


V. Illustrative Examples
A. Preliminary Results

In this section, we provide some results that we need for the illustrative examples in the next subsections. We begin by recalling the Kronecker product and its properties (see [?], [?]).

Definition 6: If $A$ is an m-by-n matrix and $B$ is a p-by-q matrix, then the Kronecker product $A \otimes B$ is the mp-by-nq block matrix

$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}.$$

The next proposition expresses the transition probability matrix of the entire system as a Kronecker product of the transition probability matrices of the subsystems.

Proposition 5 [?]: Consider N evolving subsystems with corresponding transition probability matrices $P_{(i)}$, $i = 1, \dots, N$, defined by $P_{(i)}(X_{t+1(i)} = x'_{(i)} \mid X_{t(i)} = x_{(i)}, U_{t(i)} = u_{(i)})$. Now consider that the system operates under the control policy $\pi$. Then the transition probability matrix of the entire system satisfies

$$P^{\pi} = P^{\pi}_{(1)} \otimes P^{\pi}_{(2)} \otimes \cdots \otimes P^{\pi}_{(N)}. \quad (44)$$

Proposition 6 [?]: Consider a controlled Markov chain with a unique probability distribution for each control policy $\pi$ (A3) for the entire system and another one for each subsystem. Then the stationary probability distribution of the entire system, $\beta^{\pi}$, can be expressed as the Kronecker product of the stationary probability distributions of the subsystems, $\beta^{\pi}_{(i)}$, $i = 1, \dots, N$, i.e.,

$$\beta^{\pi} = \beta^{\pi}_{(1)} \otimes \beta^{\pi}_{(2)} \otimes \cdots \otimes \beta^{\pi}_{(N)}. \quad (45)$$

B. A System with Subsystems of a Minor Group

We consider a system of two interactive subsystems of a minor group [?], illustrated in Fig. 2.

Fig. 2. A system of two subsystems.

Each subsystem has two states, i.e., $S_{(i)} = \{1, 2\}$, and two control actions, $U_{(i)} = \{a, b\}$. Thus the system has four states, $S = \{1, 2, 3, 4\} = \{(1,1), (1,2), (2,1), (2,2)\}$, and there are sixteen control policies. The transition probability matrices associated with the control policies of the first subsystem are

[The four 2 × 2 matrices $P^{\pi}_{(1)}$ are not legible in the source.]

Similarly, the transition probability matrices of the second subsystem are

[The four 2 × 2 matrices $P^{\pi}_{(2)}$ are not legible in the source.]

The output of each subsystem with respect to each control policy is given by four 2 × 2 matrices, since each subsystem has two states and two actions. For the first subsystem, the output $Y_{(1)}$ corresponding to each control policy is

[The output matrices are not legible in the source.]


We assume that 25% of the output of subsystem 1 goes to subsystem 2, i.e., $0.25 \cdot Y_{(1)}$, and that 43% of the output of subsystem 2 goes to subsystem 1, i.e., $0.43 \cdot Y_{(2)}$. The inputs of the subsystems are $W_{t(1)} = 15$ and $W_{t(2)} = 16$, respectively. Furthermore, we assume that the transition costs of subsystems 1 and 2 are given by (46) and (47), respectively, and that the transition cost of the entire system is given by (48). [Equations (46)–(48) are not legible in the source.]

The transition cost matrix of each subsystem and of the entire system is a 4 × 4 matrix, since there are four states in total (two for each subsystem) and the cost depends on each state and control action. For example, substituting the inputs $W_{t(1)}$, $W_{t(2)}$ and the outputs $Y^{\pi^1}_{(1)}$, $Y^{\pi^1}_{(2)}$ in (46), (47), and (48) yields the transition cost matrices of each subsystem, $C^{\pi^1}_{(1)}$ and $C^{\pi^1}_{(2)}$, and of the system, $C^{\pi^1}$, when the system operates under the control policy $\pi^1$. The entry (3, 2) of these matrices corresponds to the cost incurred when subsystem 1 resides at state 2 and transits to state 1 while subsystem 2 resides at state 1 and transits to state 2, under the control policy $\pi^1$.

Similar to the cost matrix, the transition probability matrix of the system is 4 × 4. When the system operates under the control policy $\pi^1$, it is given by Proposition 5, i.e., $P^{\pi^1} = P^{\pi^1}_{(1)} \otimes P^{\pi^1}_{(2)}$.

The one-stage expected cost of each subsystem $i$, $k_{(i)}(x_{t(1:2)}, u_{t(1:2)})$, is a 4 × 1 vector whose $m$th element is the expected transition cost out of state $m$,

$$k^{\pi}_{(i)}(m) = \sum_{j=1}^{4} P^{\pi}(m, j)\, C^{\pi}_{(i)}(m, j).$$

For example, applying this to subsystem 1 under the control policy $\pi^1$ gives the vector $k^{\pi^1}_{(1)}$.
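The construction above (Proposition 5 for the 4 × 4 transition matrix, the element-wise one-stage expected cost, and Proposition 6 for the stationary distribution) can be sketched in a few lines of Python. All matrices below are hypothetical placeholders, not the paper's data, and the helper names are ours. The state ordering is $S = \{(1,1), (1,2), (2,1), (2,2)\}$, so entry (3, 2) (row index 2, column index 1 in zero-based Python) pairs subsystem 1 moving 2 → 1 with subsystem 2 moving 1 → 2.

```python
def kron(A, B):
    """Kronecker product (Definition 6) of matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def one_stage_expected_cost(P, C):
    """k(m) = sum_j P(m, j) * C(m, j): expected transition cost out of state m."""
    return [sum(p * c for p, c in zip(p_row, c_row)) for p_row, c_row in zip(P, C)]


P1 = [[0.7, 0.3], [0.2, 0.8]]  # hypothetical P^pi_(1)
P2 = [[0.6, 0.4], [0.3, 0.7]]  # hypothetical P^pi_(2)
beta1 = [0.4, 0.6]             # stationary distribution of P1
beta2 = [3 / 7, 4 / 7]         # stationary distribution of P2

P = kron(P1, P2)                  # Proposition 5: 4 x 4 system matrix
beta = kron([beta1], [beta2])[0]  # Proposition 6
C = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]  # hypothetical cost matrix C(m, j)

k = one_stage_expected_cost(P, C)
J = sum(b * x for b, x in zip(beta, k))  # average cost via (8)

# beta is stationary for the Kronecker-product chain: beta P = beta.
beta_P = [sum(beta[i] * P[i][j] for i in range(4)) for j in range(4)]
print(max(abs(x - y) for x, y in zip(beta_P, beta)) < 1e-12)  # True
print(round(P[2][1], 4), round(J, 4))                          # 0.08 2.7714
```

Because every row of this particular cost matrix is the same, $J$ here reduces to $\beta \cdot (1, 2, 3, 4)^{T} = 19.4/7$, which gives a quick hand check of the code.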
The stationary probability distribution of the system follows from Proposition 6. For example, the stationary distribution imposed by the control policy $\pi^1$ is $\beta^{\pi^1} = \beta^{\pi^1}_{(1)} \otimes \beta^{\pi^1}_{(2)} = [\,0.2707\ \ 0.3008\ \ 0.2030\ \ 0.2256\,]$. Hence, by (8), the average cost of subsystem 1 under the policy $\pi^1$ is $J_1(\pi^1) = \beta^{\pi^1} \cdot k^{\pi^1}_{(1)} = 2.5602$. In a similar way, we can compute the corresponding one-stage cost vectors and probability distributions for subsystems 1 and 2 and for the entire system for all 16 control policies. The average costs of the subsystems and of the system under each control policy are summarized in Tables I, II, and III. Each value in the tables, read row by row, is the long-run expected average cost of the control policies $\pi^1$ to $\pi^{16}$. Subsystem 1 attains its minimum average cost $J_1$ under the policy $\pi^{13}$, and subsystem 2 attains its minimum under the policy $\pi^4$. Finally, the entire system attains its optimum under the control policy $\pi^{16}$, which is the Pareto control policy, as it corresponds to the Pareto efficiency of the one-stage expected costs of the subsystems.
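The figures reported above can be cross-checked with a few lines of Python, using only numbers printed in this section and in Tables I–III. The sketch recovers the subsystem marginals of $\beta^{\pi^1}$ under the ordering $S = \{(1,1), (1,2), (2,1), (2,2)\}$ and confirms that their Kronecker product reproduces the reported vector to within rounding, as Proposition 6 predicts; it then locates the minima of the three tables. As an additional, hedged observation, $\pi^{16}$ is also nondominated with respect to the pair of tabulated subsystem average costs $(J_1, J_2)$; the paper's Pareto claim itself concerns the one-stage expected costs, which are not tabulated.

```python
beta_pi1 = [0.2707, 0.3008, 0.2030, 0.2256]  # reported stationary distribution for pi^1
beta_1 = [beta_pi1[0] + beta_pi1[1], beta_pi1[2] + beta_pi1[3]]  # marginal of subsystem 1
beta_2 = [beta_pi1[0] + beta_pi1[2], beta_pi1[1] + beta_pi1[3]]  # marginal of subsystem 2
product = [a * b for a in beta_1 for b in beta_2]                # beta_1 (x) beta_2
print(max(abs(p - r) for p, r in zip(product, beta_pi1)) < 1e-4)  # True

# Tables I-III, read row by row: entry n - 1 is the cost of policy pi^n.
J1 = [2.5602, 2.6712, 2.6390, 2.7255, 2.0249, 2.1127, 2.0872, 2.1556,
      1.8029, 1.8811, 1.8584, 1.9193, 1.6317, 1.7025, 1.6820, 1.7371]
J2 = [2.2511, 1.8617, 1.7921, 1.5235, 2.3194, 1.9182, 1.8464, 1.5697,
      2.3102, 1.9106, 1.8391, 1.5634, 2.3383, 1.9338, 1.8615, 1.5825]
J = [2.7557, 2.4427, 2.4607, 2.2307, 2.3801, 2.1328, 2.1522, 1.9695,
     2.3178, 2.0876, 2.1108, 1.9398, 2.1821, 1.9746, 1.9977, 1.8431]


def best_policy(costs):
    """1-based index n of the policy pi^n with the smallest cost."""
    return 1 + min(range(len(costs)), key=costs.__getitem__)


print(best_policy(J1), best_policy(J2), best_policy(J))  # 13 4 16

# Is pi^16 dominated by any other policy with respect to the pair (J1, J2)?
n = 16 - 1
dominated = any(J1[m] <= J1[n] and J2[m] <= J2[n] and (J1[m] < J1[n] or J2[m] < J2[n])
                for m in range(16) if m != n)
print(dominated)  # False
```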

TABLE I
Long-Run Average Costs for Subsystem 1

2.5602  2.6712  2.6390  2.7255
2.0249  2.1127  2.0872  2.1556
1.8029  1.8811  1.8584  1.9193
1.6317  1.7025  1.6820  1.7371


TABLE II
Long-Run Average Costs for Subsystem 2

2.2511  1.8617  1.7921  1.5235
2.3194  1.9182  1.8464  1.5697
2.3102  1.9106  1.8391  1.5634
2.3383  1.9338  1.8615  1.5825


TABLE III
Long-Run Average Costs for Entire System

2.7557  2.4427  2.4607  2.2307
2.3801  2.1328  2.1522  1.9695
2.3178  2.0876  2.1108  1.9398
2.1821  1.9746  1.9977  1.8431


C. A System with Subsystems of a Minor Group with Varying Transition Probability and Cost Matrices

In this example, we use synthetic data to examine the Pareto control policy of the system. First, we use DP to compute the optimal control policy, denoted by $\pi^{*}$, that minimizes the average cost of the entire system. We anticipate that the Pareto control policy $\pi^{\circ}$ will yield the same result.

Let the subsystems' inputs be $W_{t(1)} = 15$ and $W_{t(2)} = 16$, as in the previous example. Next, a random output of each subsystem is considered. The total output of subsystem 1 associated with action $a$ is a matrix with random entries distributed according to the uniform distribution $\mathcal{U}(1, 3)$. Similarly, the total output of the same subsystem with respect to action $b$ is a matrix with entries distributed according to $\mathcal{U}(8, 10)$. For the second subsystem, the entries of the matrix associated with action $a$ are independent and identically distributed (i.i.d.) $\mathcal{U}(2, 4)$, and those associated with action $b$ are i.i.d. $\mathcal{U}(9, 12)$.

Next, let $\rho^{*} = \rho(\pi^{*})$, where $\rho$ is the map defined as

$$\rho(\pi) = \|f^{\pi} - f^{*}\|, \quad (51)$$

where

$$f^{\pi} = \big(k_{(1)}(x_{t(1:2)}, u_{t(1:2)}),\ k_{(2)}(x_{t(1:2)}, u_{t(1:2)})\big)^{T} \quad (52)$$

is evaluated under the control policy $\pi$, and

$$f^{*} = \Big(\min_{u_{t(1:2)}} k_{(1)}(x_{t(1:2)}, u_{t(1:2)}),\ \min_{u_{t(1:2)}} k_{(2)}(x_{t(1:2)}, u_{t(1:2)})\Big)^{T}. \quad (53)$$

We perform 1,000 replications and observe in Fig. 3 that the absolute difference between $\rho(\pi^{\circ})$ and $\rho(\pi^{*})$ is zero. This indicates that $\pi^{\circ}$ in fact yields the strong Pareto solution for the one-stage expected costs. Furthermore, Fig. 3 shows that $\min_{\pi \in \Pi} \rho(\pi) = \rho(\pi^{\circ})$, where $\pi^{\circ}$ is such that $J^{*} = J(\pi^{\circ})$. Hence, based on these synthetic data, one can conclude that the optimal control policy of the entire system is the Pareto control policy.

Fig. 3. Histograms of the difference between the average costs corresponding to the optimal and Pareto control policies, $\pi^{*}$ and $\pi^{\circ}$, respectively (horizontal axis: $|\rho(\pi^{*}) - \rho(\pi^{\circ})|$).

D. Power Management Control of a Hybrid Electric Vehicle: A System with Subsystems of a Principal Group

The results presented here have been used in the problem of optimizing online the power management control of an HEV [?] consisting of subsystems of a principal group. The Pareto control policy was validated through simulation and compared with the control policy derived offline by DP using the long-run expected average cost. Both control policies achieved the same cumulative fuel consumption, as illustrated in Fig. 4, demonstrating that in this application the Pareto control policy is optimal with respect to the average cost criterion and can be implemented online. This work has been extended [48] by considering the battery in the problem formulation, in addition to the engine's and motor's efficiency, which can provide insights on how to prioritize these objectives based on consumers' needs and preferences.

Fig. 4. Cumulative fuel consumption and state of charge of the battery for a parallel hybrid electric vehicle using the control policy derived from dynamic programming and the Pareto control policy over the city-suburban heavy duty vehicle route driving cycle [?].

VI. CONCLUDING REMARKS 

In this paper, we established a framework for the analysis and stochastic optimization of complex systems consisting of interactive subsystems. We formulated the stochastic control problem as a multiobjective optimization problem of the one-stage expected costs of the subsystems and developed a duality framework to prove that the Pareto control policy minimizes the long-run expected average cost criterion of the system. We provided a geometric interpretation of the solution and conditions for its existence. The Pareto control policy identifies an equilibrium operating point among the subsystems. If the system operates at this equilibrium, then the long-run expected average cost per unit time is minimized. For practical situations with constraints consistent with those studied here, our results imply that the Pareto control policy may be of value when we seek to derive online the optimal control policy in complex systems.

One potential extension of this work could be to investigate whether a similar analysis can yield the desired emergence in a complex system from a decentralized perspective. Emergence refers to the spontaneous creation of order and functionality from the bottom up. Wherever we see complex systems in the physical world, we see emergent patterns at every level, both in structure and functionality. Emergence occurs without a central planner, from the bottom up, based on the interaction of the individual entities in a system. As a simple example from the natural world of how emergence arises, we can consider the flying patterns created by a flock of birds following three simple rules: 1) stay close to, but don't bump into, the birds around me; 2) fly as fast as the birds near me; and 3) move towards the center of the group. The fact that rules applied locally lead to a macro-level property is what is meant by the term bottom up. Another example of a bottom-up emergent phenomenon is the traffic jam resulting from a specific sequence of vehicle-to-vehicle and vehicle-to-infrastructure interactions. If we could develop a framework to characterize emergence, then we would be able to design the rules for the interactions of the individual subsystems so that the desired emergent phenomena would occur.